APPLICATION FOR UNITED STATES LETTERS PATENT 



INVENTORS: Sang Ik JUNG and Seok Jin YOON 



TITLE: CACHE FLUSH SYSTEM AND METHOD THEREOF 



ATTORNEYS & ADDRESS: FLESHNER & KIM, LLP

P. O. Box 221200

Chantilly, VA 20153-1200



DOCKET NO.: SI-0052 




CACHE FLUSH SYSTEM AND METHOD THEREOF 



BACKGROUND OF THE INVENTION 

1. Field of the Invention 

[1] The present invention relates to a cache flush system and a method thereof.

2. Background of the Related Art 

[2] In a general processor system, a cache memory temporarily stores data needed by the
processor in order to speed up the processor's access to a main memory.
Generally, a cache memory maintains management information called a "cache tag" to
track which data in the main memory is stored in each cache block of the cache memory, and
whether the data stored in the cache block has been changed so that its contents
differ from the contents of the corresponding data in the main memory
(i.e., a modified state or dirty state).

[3] In a multi-processor system including a plurality of processors, a plurality of
cache memories exist and each cache memory has a snoop mechanism in order to assure
memory coherency or data coherency. The snoop mechanism checks whether a processor bus
instruction affects data stored in each cache memory, whether the data stored in each cache
memory should be returned, etc., and makes the corresponding data invalid.

[4] Cache memory includes a copy back type and a write through type. A copy
back type cache memory maintains data for a certain period without instantly
reflecting a data renewal by a processor in the main memory. It is therefore necessary to actively
write back to the main memory any data changed by a processor whose contents have become
different from the contents of the main memory. For example, when data stored in cache memory is
transmitted to an input/output device not having a snoop mechanism, it is necessary to write
back. A cache flush algorithm is used to write data whose contents have been changed, among data
stored in the cache memory, back to the main memory. Further, a cache block in a dirty state
is called a dirty block.

[5] The cache flush algorithm is useful for a fault tolerant or replicant system as
well as for data transmission to an input/output device not having a snoop mechanism. In
other words, in the case of a check point system that restarts a process from the previously
obtained check point when one of prescribed system failures occurs, it is necessary to write
data that has been changed and is stored only in cache memory back to the main memory.

[6] Generally, the cache flush algorithm is performed on the basis of software
including a cache flush instruction. A processor uses the software to determine whether a
cache block is a dirty block by referring to the contents of a cache tag. If the cache block is a dirty
block, the cache flush algorithm is performed, writing the data of the corresponding cache block
back to the main memory.

[7] As discussed above, a processor should perform the cache flush algorithm, that is,
re-record data in main memory, when a state of a cache block is dirty as a result of checking
the states of all cache blocks. A prescribed amount of time is needed to check the states of all cache
blocks.
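The software-based flush described in paragraphs [6] and [7] can be sketched as follows. This is only an illustrative model, not the related art's actual code: the `CacheBlock` record, the `software_flush` function and the `block_address` callback are hypothetical names introduced here. The point of the sketch is that every block must be visited, which is the source of the time cost noted above.

```python
from dataclasses import dataclass


@dataclass
class CacheBlock:
    """Hypothetical model of one cache block and its cache tag."""
    tag: int             # upper address bits held in the cache tag
    data: bytes          # cached contents
    valid: bool = False
    dirty: bool = False  # contents differ from main memory


def software_flush(blocks, main_memory, block_address):
    """Scan every block and write dirty ones back to main memory.

    block_address(position, tag) is an assumed helper that rebuilds the
    main memory address of a block. Returns the number of blocks written.
    """
    written = 0
    for position, block in enumerate(blocks):  # all blocks are checked
        if block.valid and block.dirty:
            main_memory[block_address(position, block.tag)] = block.data
            block.dirty = False                # block is clean after write back
            written += 1
    return written
```

The loop runs over the whole cache even when only a few blocks are dirty, which is the overhead the later embodiments aim to avoid.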

[8] A multi-processor system of the related art will be described with reference to
Figure 1. As shown in Figure 1, the related art multi-processor system includes a plurality of
processors (CPU, Central Processing Unit) 5 connected to a processor bus, a plurality of
cache memories 1 connected to each processor, a memory controller 2 connected to the
processor bus, a main memory 3 under control of the memory controller 2, and other system
resources 4 connected to the processor bus. Each of the processors includes cache memory
1 inside the processor, on the back side outside the processor, or both.
Cache memory inside the processor is called level 1 cache memory and cache memory
outside the processor is called level 2 cache memory.

[9] The processors 5 are connected to each other through a common
processor bus and are able to access the main memory 3 through the processor bus
for instruction fetch and loading/storing data. The access to main memory 3 is generally
achieved through the memory controller 2.

[10] The related art multi-processor system is connected to the other system
resources 4 such as an input/output device as well as the above-described basic resources, in
order to perform specific assigned functions. If a 32 bit address bus and a 64 bit data bus are
provided, all devices such as the processor 5, the memory controller 2 and the other system
resources 4 should have the same standard interface as that of the processor bus. Further, the
cache memory 1 of each processor 5 has a configuration based on the standard interface.

[11] Each of the processors 5 has a 32 kilobyte (KB) level 1 instruction cache
memory and a 32 KB level 1 data cache memory inside, and a 1 megabyte (MB) level 2 cache
memory on the back side outside.

[12] An exemplary structure of a level 1 data cache memory will be described with
reference to Figure 2. The level 1 data cache memory includes tag RAM (Random Access
Memory) and data RAM. The level 1 data cache memory implements 8-way set-associative
mapping. Each of the 8 cache blocks includes 4 words (W0~W3, each 64 bits) and
an address tag (20 bits) corresponding to the 4 words. Further, each cache block has 3 state
information bits for indicating the state of the cache block, that is, a valid state bit,
a modified state bit and a shared state bit.
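The 20-bit address tag quoted above follows from the other figures in paragraphs [10] to [12]. The short calculation below is a sketch that assumes a 32-bit address (as mentioned in paragraph [10]) and power-of-two sizes; `cache_geometry` is a hypothetical helper name.

```python
def cache_geometry(capacity_bytes, ways, words_per_block, word_bits, address_bits=32):
    """Derive block size, set count and address field widths.

    Assumes all sizes are powers of two, so bit_length() - 1 equals log2.
    """
    block_bytes = words_per_block * word_bits // 8
    sets = capacity_bytes // (ways * block_bytes)
    offset_bits = block_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return {
        "block_bytes": block_bytes,
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
    }


# 32 KB, 8-way, 4 words of 64 bits per block
level1_data_cache = cache_geometry(32 * 1024, 8, 4, 64)
```

For these parameters the block is 32 bytes, there are 128 sets, and the 32 address bits split into 5 offset bits, 7 index bits and 20 tag bits.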

[13] Further, the level 1 instruction cache memory has the same configuration as
that of the level 1 data cache memory. However, the level 1 instruction cache memory has
only a valid state bit as a state information bit. Further, the level 2 cache memory stores data
and instructions in data RAM and adopts direct mapping.

[14] In the related art, in order to increase efficiency of the cache, the level 1 cache
memory and the level 2 cache memory adopt the write back type as a write policy. However,
problems relevant to memory coherence between processors and between a processor and
an input/output device can be caused because of the write policy. To manage this, a cache
controller unit of each processor 5 uses a modified/exclusive/shared/invalid (MESI)
protocol. Figure 3 illustrates a MESI protocol.

[15] As illustrated in Figure 3, a state of the MESI protocol includes a modified
state, an exclusive state, a shared state and an invalid state, and the state may be expressed by
combining state information bits of each cache block. The modified state, the exclusive
state and the shared state are examples of a valid state. The cache flush algorithm is performed
especially in the modified state and the exclusive state among the valid states.

[16] For example, state information bits of the invalid state are as follows: the valid
state bit (V) is 0; the modified state bit (M) is 0; and the shared state bit (S) is 0. State
information bits of the modified state are as follows: the valid state bit (V) is 1; the modified
state bit (M) is 1; and the shared state bit (S) is 0. State information bits of the shared state
are as follows: the valid state bit (V) is 1; the modified state bit (M) is 0; and the shared
state bit (S) is 1. State information bits of the exclusive state are as follows: the valid state bit
(V) is 1; the modified state bit (M) is 0; and the shared state bit (S) is 0.
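The bit combinations listed in paragraph [16] can be written as a small lookup table. The sketch below is illustrative only; `MESI_ENCODING`, `decode_state` and `is_flush_candidate` are hypothetical names, and combinations not listed in the text are treated as errors rather than guessed at.

```python
# (V, M, S) -> state, exactly as enumerated in paragraph [16]
MESI_ENCODING = {
    (0, 0, 0): "invalid",
    (1, 1, 0): "modified",
    (1, 0, 1): "shared",
    (1, 0, 0): "exclusive",
}


def decode_state(v, m, s):
    """Map the three state information bits to a MESI state name."""
    try:
        return MESI_ENCODING[(v, m, s)]
    except KeyError:
        raise ValueError(f"bit combination V={v} M={m} S={s} is not listed") from None


def is_flush_candidate(state):
    """Per paragraph [15], flushing concerns the modified and exclusive states."""
    return state in ("modified", "exclusive")
```

For example, `decode_state(1, 0, 0)` gives the exclusive state, which is a flush candidate, while the shared state `(1, 0, 1)` is not.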

[17] Cache memory 1, which is separately managed by each processor
according to the MESI protocol, maintains memory coherency by performing a cache flush
algorithm that writes a cache block in the modified state (i.e., the dirty state) back to the main
memory when a certain event of the multi-processor system occurs. The procedure is as follows. If
the certain event happens, each of the processors 5 performs an exception routine associated
with the certain event. The cache flush algorithm is performed at an appropriate moment in
the middle of the exception routine. By loading a continuous memory area amounting to
two times the level 2 cache memory size, the cache flush algorithm is performed for modified
cache blocks of the level 1 data cache memory and the level 2 cache memory.

[18] An event that needs a cache flush is generally an emergency, and therefore the
event needs to be processed promptly. However, because all processors connected to
the processor bus perform memory reads as large as the level 2 cache memory size at the
same time, loads of the processor bus increase unnecessarily. Further, the actual cache flush
algorithm is performed only some time after the certain event happens because the
cache flush algorithm is performed by an exception routine of each processor. Thus, there
can be a problem in that the cache flush algorithm cannot be performed promptly.

[19] Additional advantages, objects, and features of the invention will be set forth 
in part in the description which follows and in part will become apparent to those having 
ordinary skill in the art upon examination of the following or may be learned from practice
of the invention. The objects and advantages of the invention may be realized and attained 
as particularly pointed out in the appended claims. 

SUMMARY OF THE INVENTION 

[20] An object of the invention is to solve at least the above problems and/or 
disadvantages and to provide at least the advantages described hereinafter. 

[21] Another object of the present invention is to provide a multi-processor 
system and methods thereof that reduce or minimize loads of a processor bus. 

[22] Another object of the present invention is to provide a multi-processor system 
and methods thereof that provide a high speed and automated cache flush operation in 
response to an event in a multi-processor system. 

[23] Another object of the present invention is to provide a multi-processor system 
and methods thereof that reduce or minimize loads of a processor bus by performing a 
bounded memory read (e.g., at most level 2 cache memory size) of each processor. 

[24] Another object of the present invention is to provide simultaneousness of 
cache flush against a certain event by directly triggering a cache flush process using the 
certain event. 

[25] In order to achieve at least the above objects, in whole or in parts, there is 
provided a cache flush system including a valid array unit configured to provide cache block 
information for an update algorithm and index information for a cache flush algorithm of at 
least one cache block in a prescribed state, a storage unit configured to store tags and 
provide match address information for the update algorithm and tag information for the
cache flush algorithm, a bus snooper configured to perform the update algorithm for the tag 
storage unit and the valid array unit by monitoring a processor bus and by tracing a state of 
each cache memory and a cache flush unit configured to detect a system event, to perform 
the cache flush algorithm for corresponding cache blocks in the prescribed state. 

[26] To further achieve at least the above objects, in whole or in parts, there is 
provided a cache flush method including updating status information by monitoring a 
transaction of a processor bus and tracing states of cache memory corresponding to each 
processor and flushing at least one cache block in a prescribed state among the cache blocks 
by detecting a prescribed event, generating a read transaction using the status information 
and outputting the generated read transaction. 

[27] To further achieve at least the above objects, in whole or in parts, there is
provided a multi-processor system that includes a plurality of processors coupled to a
processor bus, at least one cache memory coupled to each processor, a memory controller
coupled to the processor bus and a cache flush system coupled to the processor bus,
wherein the cache flush system includes a first unit configured to provide cache status
information, a second unit coupled to the first unit and configured to update the cache status
information and a third unit coupled to the first unit and configured to detect system events
to perform cache flushing for corresponding cache blocks in a prescribed state responsive to
the detected event.

[28] Additional advantages, objects, and features of the invention will be set forth 
in part in the description which follows and in part will become apparent to those having
ordinary skill in the art upon examination of the following or may be learned from practice
of the invention. The objects and advantages of the invention may be realized and attained 
as particularly pointed out in the appended claims. 



BRIEF DESCRIPTION OF THE DRAWINGS 

[29] The invention will be described in detail with reference to the following 
drawings in which like reference numerals refer to like elements wherein: 
[30] Figure 1 illustrates a related art multi-processor system; 
[31] Figure 2 illustrates an exemplary level 1 data cache memory structure; 
[32] Figure 3 illustrates a MESI protocol; 

[33] Figure 4 illustrates an exemplary embodiment of a multi-processor system 
according to a preferred embodiment of the present invention; 

[34] Figure 5 illustrates an exemplary embodiment of a cache flush system in 
Figure 4 according to a preferred embodiment of the present invention; 

[35] Figure 6a illustrates an exemplary 3-dimensional structure of valid array unit in 
Figure 5; 

[36] Figure 6b illustrates an exemplary 2-dimensional structure of valid array unit in 
Figure 5; 

[37] Figure 7 illustrates an exemplary DCAND in Figure 6a; 
[38] Figure 8 illustrates an exemplary DCOR in Figure 6a; 
[39] Figure 9 illustrates an exemplary tag storage unit in Figure 5; 
[40] Figure 10 illustrates an exemplary address mapping in Figure 9; 



[41] Figure 11 illustrates an exemplary embodiment of a cache flush method 
according to another preferred embodiment of the present invention; 

[42] Figure 12 illustrates an exemplary renewal procedure in Figure 11; 
[43] Figure 13 illustrates an exemplary placement operation; 
[44] Figure 14 illustrates an exemplary cache flush procedure in Figure 11; and 
[45] Figure 15 illustrates an exemplary address arrangement operation. 

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 

[46] Figure 4 is a diagram that shows a multi-processor system according to one
exemplary embodiment of the present invention. As shown in Figure 4, a multi-processor
system according to the present invention can include a copy back type cache memory 6
having a bus snoop mechanism and includes a plurality of processors 7 coupled to a
processor bus 12, a memory controller 8 coupled to the processor bus 12, a main memory 9
under control of the memory controller 8, other system resources 11 coupled to the
processor bus 12 and a cache flush system 100 coupled to the processor bus 12. Figure 4
illustrates four processors 7 (e.g., processor 0 ~ processor 3) and four cache memories 6 (e.g.,
cache memory 0 ~ cache memory 3) according to the number of processors. However, the
present invention is not intended to be so limited.

[47] Figure 5 is a diagram that shows an exemplary embodiment of the cache flush 
system 100. As shown in Figure 5, the cache flush system 100 can perform cache flush 
algorithm against a certain or prescribed event and manage states of each cache memory 6 by 
tracing the states thereof. The cache flush system 100 can include a valid array unit 30, a bus
snooper 10, a tag storage unit 20 and a cache flush unit 40. 

[48] The valid array unit 30 will now be described. Figure 6a is a diagram that 
illustrates 3-dimensional structure of the valid array unit in Figure 5, and Figure 6b is a 
diagram that illustrates 2-dimensional structure of the valid array unit in Figure 5. 

[49] The valid array unit 30 can include a Divide & Conquer AND tree (DCAND)
31 and a Divide & Conquer OR tree (DCOR) 32. The valid array unit 30 preferably
provides the bus snooper 10 with cache block information for an update method or
algorithm for a cache block in a valid state (i.e., an exclusive state or a modified state) and
provides the cache flush unit 40 with index information for a cache flush method or
algorithm.

[50] More specifically, the cache flush system 100 can perform cache flush for 
cache block in the exclusive state or modified state in cache memory 6 when a certain event 
(i.e., prescribed events) happens. The valid array unit 30 can be used to effectively manage 
the performance of the cache flush process. 

[51] The valid array unit 30 can include a valid array. The valid array is a
3-dimensional array whose elements can be valid bits indicating whether a corresponding cache
block is in an exclusive state or a modified state, that is, a valid state, and one axis of the valid
array is the processor 7. Thus, a 2-dimensional valid array can exist for each processor as
illustrated in Figure 6b. Figure 6a illustrates an example of the 3-dimensional structure of the valid
array for one processor 7. As shown in Figure 6a, one rectangular object configured by the
horizontal axis (e.g., index 0 ~ index 7) and the vertical axis (e.g., 0x0000~0x7FFF), that is,
one valid array, indicates one processor 7. If the number of processors is 4, the number of
corresponding rectangular objects, that is, valid arrays, should be 4. Further, Figure 6b
illustrates an example of the 2-dimensional structure of the valid array for four processors 7. Each
processor 7 includes 8 cache blocks and each cache block includes a plurality of valid bits.
However, the present invention is not intended to be so limited.

[52] If a valid bit is a first preset value, e.g., '1', the cache block is in a valid state, so
that the cache block should be cleared. Thus, the cache flush process is performed. In
contrast, if the valid bit is a second preset value, e.g., '0', the cache block is in an
invalid state, so that the cache block does not need to be cleared. Thus, the cache flush process
is not performed.

[53] Further, the valid array unit 30 can implement logic trees, such as DCAND 31 
and DCOR 32, in order to perform an update process and a cache flush process effectively. 
Figures 7 and 8 respectively illustrate an exemplary DCAND 31 and simplified DCOR 32 of 
a certain cache block of Figure 6a. The DCAND 31 illustrated in Figure 7 is directed to or 
about all valid bits for one cache set index and can provide the bus snooper 10 with cache 
block information. The DCOR 32 illustrated in Figure 8 is an example of valid bit having a 
5 bit index among 15 bits indexing one cache block and can provide the cache flusher 41 
with index information. 
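The two logic trees of paragraph [53] can be modeled as pairwise reductions over a row of valid bits. The sketch below is a software analogy only, since the text describes hardware trees; `reduce_tree`, `dcand` and `dcor` are hypothetical helper names, and the input length is assumed to be a power of two.

```python
def reduce_tree(bits, op):
    """Return every level of a binary reduction tree, leaves first, root last.

    Assumes len(bits) is a power of two so that each level pairs up evenly.
    """
    levels = [list(bits)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([op(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def dcand(valid_bits):
    """AND tree: a 0 at a node means an empty cache block exists below it."""
    return reduce_tree(valid_bits, lambda a, b: a & b)


def dcor(valid_bits):
    """OR tree: a 1 at a node means a valid cache block exists below it."""
    return reduce_tree(valid_bits, lambda a, b: a | b)
```

A root of 0 in the AND tree tells the bus snooper that at least one of the eight cache blocks for that index is free, and a root of 1 in the OR tree tells the cache flusher that at least one valid bit is set somewhere in the covered range.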

[54] Figure 9 is a diagram that illustrates an exemplary tag storage unit of Figure 5. 
Figure 10 is a diagram that illustrates an exemplary address mapping of Figure 9. 

[55] The tag storage unit 20 can include a tag RAM unit 21 and a match logic unit
22, and the tag storage unit 20 preferably stores tags and provides the bus snooper 10 with
match address information for the update process and the cache flush unit 40 with tag
information for the cache flush process. The tag RAM unit 21 can store a certain number of
tags for a certain index, and the match logic unit 22 can output the match address
information by interfacing with the tag RAM unit 21 and inform the bus snooper 10
whether the processor bus address matches the tag of the corresponding address index. The match
logic unit 22 can output the tag information and provide the cache flush unit 40 with the tag
making up a 'read' transaction address.

[56] The tag RAM unit 21 included in the tag storage unit 20 can store a certain
number of tags per index, and the size of the index is inversely proportional to the number of
tags as illustrated in Figure 10. For an exemplary configuration of a 32 KB 8-way
set-associative level 1 data cache memory and a 1 MB direct-mapped level 2 cache memory,
indexes are 15 bits and tags are 12 bits.
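The 15-bit index and 12-bit tag can be illustrated by splitting an address into fields. The sketch assumes a 32-bit address whose remaining 5 low-order bits are the offset within a 32-byte block (4 words of 64 bits); the offset width is inferred here and is not stated in the text. `split_address` is a hypothetical name.

```python
TAG_BITS = 12     # from the text
INDEX_BITS = 15   # from the text
OFFSET_BITS = 5   # assumption: 32-bit address, 32-byte cache block


def split_address(address):
    """Split a 32-bit address into (tag, index, offset) fields."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = (address >> (OFFSET_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, offset


tag, index, offset = split_address(0xABC12345)
```

With these widths the address 0xABC12345 yields tag 0xABC, index 0x91A and offset 5. The 15-bit index range 0x0000~0x7FFF matches the vertical axis of the valid array in Figure 6a.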

[57] Figure 9 illustrates an exemplary configuration of the tag storage unit 20. The
tag RAM unit 21 can include 8 tag RAMs (e.g., tag RAM 0 ~ tag RAM 7) for each processor
7. The match logic unit 22 can be directly coupled to the tag RAM unit 21 and include
match logics (e.g., match logic 0 ~ match logic 7) separately interfacing with each tag RAM.
Thus, the match logic unit 22 can update contents of the corresponding tag RAM and extract
an address for the cache flush process. Further, the match logic unit 22 preferably determines
whether an address that matches each cache memory 6 exists among the addresses of transactions
currently in progress on the processor bus 12 and notifies the bus snooper 10 of whether the
address exists. Thus, updating of the tag RAM can be normally performed.

[58] In other words, in Figure 9, address A (12:26) among transaction addresses is
inputted into the tag RAM of the tag RAM unit 21 and address A (0:11) is inputted into the match
logic of the match logic unit 22. For example, match logic 0 outputs to match (0,0) whether
the address matches tag RAM 0, match logic 1 outputs to match (0,1) whether the address matches
tag RAM 1, ... , and match logic 7 outputs to match (0,7) whether the address matches
tag RAM 7. Preferably, if '1' is outputted as a result of an OR operation on the match outputs, the
match logic unit 22 notifies the bus snooper 10 that the address matches. Further, the tag
RAM unit 21 receives an index from the cache flush unit 40 and the match logic unit 22 provides
the cache flush unit 40 with tag information corresponding to the index.
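The per-way comparison and the final OR of paragraph [58] might be modeled as below. This is a minimal sketch for one processor: each tag RAM is represented as a dictionary from index to stored tag, which is an assumption made for illustration, and `match_outputs` and `any_match` are hypothetical names.

```python
def match_outputs(tag_rams, index, tag):
    """Return one match bit per tag RAM for the given index and tag.

    tag_rams is a list of 8 dictionaries mapping index -> stored tag.
    """
    return [1 if ram.get(index) == tag else 0 for ram in tag_rams]


def any_match(outputs):
    """OR the individual match outputs into a single 'address matches' bit."""
    return int(any(outputs))


tag_rams = [{} for _ in range(8)]
tag_rams[3][0x91A] = 0xABC          # tag RAM 3 holds tag 0xABC at index 0x91A
hit = match_outputs(tag_rams, 0x91A, 0xABC)
```

Here only match output 3 is set, so the OR result is 1 and the bus snooper would be told that the address matches. A real match logic would probably also qualify the comparison with the valid bit, which this sketch leaves out.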

[59] The bus snooper 10 will now be explained. The bus snooper 10 can perform
an update algorithm or method for the tag storage unit 20 and the valid array unit 30 by
monitoring a corresponding processor bus (e.g., preferably while being directly coupled to
the processor bus 12) and by tracing a state of each cache memory 6.
Thus, the bus snooper 10 can perform an update process for the tag storage unit
20 and the valid array unit 30. The bus snooper 10 supports a precise update by performing a
placement algorithm or process using cache block information of the DCAND 31 when
performing the update process for the tag RAM unit 21.

[60] The cache flush unit 40 will now be described. The cache flush unit 40 can
detect a certain system event, generate a 'read' transaction that makes each processor 7
perform the cache flush process for cache blocks in a valid state by using its own cache flush
process, and output the 'read' transaction.

[61] The cache flush unit 40 can include an event detector 42, a cache flusher 41
and a cache bus master 43. The event detector 42 preferably detects whether a certain
system event occurs, and the cache flusher 41 generates a 'read' transaction according to the
index information and the tag information to which a corresponding address is mapped by
performing the cache flush process. The cache bus master 43 can output the generated 'read'
transaction to the processor bus 12 and transfer the 'read' transaction to each processor 7.
Thus, the cache flush process for the cache memory 6 may be performed.

[62] The cache flusher 41 can extract the index by performing an address arrangement
algorithm or process (e.g., Address Arrangement Algorithm) using index information of the
DCOR 32, obtain tag information from the match logic unit 22 by providing the tag RAM
unit 21 with the index, and map the address by incorporating the index and the tag.
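The address mapping step of paragraph [62] can be sketched as follows. For brevity this sketch replaces the DCOR-guided address arrangement with a plain linear scan over the valid bits of one cache block, so it shows only how an index and a tag combine into a 'read' address. The 5-bit offset width is an assumption carried over from a 32-byte block, and `flush_addresses` is a hypothetical name.

```python
def flush_addresses(valid_bits, tag_ram, index_bits=15, offset_bits=5):
    """Yield one block-aligned 'read' address per valid bit that is set.

    valid_bits: sequence of 0/1 values, one per index.
    tag_ram:    mapping from index to the stored tag.
    A linear scan stands in for the DCOR-based index search.
    """
    for index, valid in enumerate(valid_bits):
        if valid:
            tag = tag_ram[index]
            yield (tag << (index_bits + offset_bits)) | (index << offset_bits)


valid_bits = [0, 0, 1, 0, 0, 0, 0, 0]
addresses = list(flush_addresses(valid_bits, {2: 0xABC}))
```

In this example the single valid bit at index 2 with tag 0xABC produces the address 0xABC00040, which the cache bus master would place on the processor bus as a 'read' transaction.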

[63] Figure 11 illustrates an embodiment of a cache flush method according to the 
present invention. The embodiment of a cache flush method shown in Figure 11 can be 
applied to and will be described using a system shown in Figure 5. However, the present 
invention is not intended to be so limited. 

[64] As shown in Figure 11, after a process starts, first, the bus snooper 10
performs an update algorithm for the tag RAMs of the tag storage unit 20 and the valid arrays of the
valid array unit 30 by monitoring a transaction of the corresponding processor bus 12 while
being directly connected to the processor bus 12 and by tracing a state of each cache
memory 6 corresponding to each processor (block S10). Preferably, the bus snooper 10
operates independently.

[65] Further, the cache flush unit 40 performs a cache flush process for cache
block(s) in a valid state in the cache memory 6 by detecting a certain event, generating a 'read'
transaction using the tag RAM and valid array, and outputting the 'read' transaction (block
S20). From block S20, the process can end.

[66] Figure 12 illustrates an exemplary updating procedure in Figure 11. As shown
in Figure 12, after a process starts, the bus snooper 10 monitors in real time whether a new
transaction is started on the processor bus 12 (block S101).

[67] In a case where a new transaction is started as a result of the monitoring, the
bus snooper 10 can determine whether an attribute of the transaction is 'read' by extracting
the attribute of the transaction (block S102).

[68] In a case where the attribute is 'read' after the determining, the bus snooper 10
can determine whether share is asserted from processors (for example, processor 1,
processor 2 and processor 3) other than the current transaction master processor 7 (for example,
processor 0) (block S103).

[69] In a case where share is asserted from the processors other than the current
transaction master processor as a result of the determining (block S103), whether a
processor whose address matches the share assertion exists can be determined (block
S104). In a case where such a processor does not exist, the bus snooper 10 preferably stops its
own operation. In contrast, in a case where such a processor exists, the bus snooper 10 does
not set a valid bit of the transaction master processor 7 from an invalid state (e.g., the valid
bit=0) to a valid state (e.g., the valid bit=1) and clears a valid bit of the processor whose
address matches the share assertion from a valid state into an invalid state (block
S105). In other words, the cache block existing in the cache memory 6 is in a shared state, so
that the cache flush algorithm need not be performed even when a system event occurs.

[70] On the other hand, as a result of the determining (block S103), in a case where
share is not asserted, in other words, in a case where no processor caches the address, the
valid array unit 30 selects a cache block through the placement algorithm, preferably using the
DCAND 31, and provides the bus snooper 10 with cache block information. The bus
snooper 10 preferably receives the cache block information from the placement algorithm
(block S106), stores the tag in the tag RAM unit of the tag storage unit 20 at the
position designated by the index corresponding to the address in the selected cache block (block
S107) and sets the corresponding valid bit of the valid array unit 30 from an invalid state to a
valid state (block S108). In other words, the current transaction master processor 7 keeps the
corresponding data exclusively, so that the cache flush process needs to be performed when a
system event occurs.

[71] Designation of the location for storing tags is preferably automatically
performed by the index corresponding to the address. In contrast, selection of the cache block to be
used preferably depends on the DCAND 31. Selecting a cache block using the DCAND 31
can be the placement process.

[72] An exemplary placement algorithm will now be described with reference to 
Figure 13. As shown in Figure 13, an exemplary placement algorithm can use the DCAND 
in Figure 7. 

[73] It is assumed in this specification that a direction towards cache block 0 is a
lower direction and a direction towards cache block 7 is a higher direction. At each
determining step, in a case where a lower direction is selected, it means '0'. In contrast, in a
case where a higher direction is selected, it means '1'. However, the present invention is not
intended to be so limited. The placement algorithm traces toward a lower direction cache block
under a condition that the output of the DCAND 31 at each step is '0', indicating that an empty
cache block exists in the corresponding branch. Accordingly, as illustrated in Figure 13, a
lower direction is '0' at a first AND determining step from a right end, so that it means '0'
according to the lower direction selection. A higher direction is '0' at a second AND
determining step from a right end, so that it means '1' according to the higher direction
selection. A higher direction is '0' at a third AND determining step from a right end, so that
it means '1' according to the higher direction selection. Finally, cache block 3 corresponding
to '011' is selected.
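The descent described in paragraph [73] can be sketched as a walk from the root of the AND tree towards a leaf. The sketch assumes that the lower direction is preferred whenever its AND output is '0', which is how the worked example reads; `place` is a hypothetical name, and the valid bits used below are chosen only to reproduce the '011' result.

```python
def place(valid_bits):
    """Select an empty cache block by descending an AND tree.

    Returns (block_number, path) where path records '0' for each lower
    direction taken and '1' for each higher direction taken, or None when
    every cache block is already valid.
    """
    if all(valid_bits):
        return None                      # root AND output is 1: nothing free
    low, high = 0, len(valid_bits)
    path = ""
    while high - low > 1:
        mid = (low + high) // 2
        if not all(valid_bits[low:mid]):  # AND of lower branch is 0
            high = mid
            path += "0"
        else:                             # otherwise the empty block is higher
            low = mid
            path += "1"
    return low, path


# cache blocks 0, 1 and 2 are occupied; cache block 3 is empty
selected = place([1, 1, 1, 0, 0, 1, 0, 1])
```

With these valid bits the three steps select lower, higher, higher, giving the path '011' and cache block 3, in line with the example above.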

[74] On the other hand, in a case where the attribute is not a 'read' as a result of
the determining (block S102), the bus snooper 10 can determine whether the attribute is
'read-with-intent-to-modify' (block S109). In a case where the attribute is a
'read-with-intent-to-modify' as a result of the determining (block S109), it can be determined whether a
processor whose address matches the 'read-with-intent-to-modify' exists (block S110). In a
case where the processor exists, the bus snooper 10 clears a valid bit of the matched
processor having a tag for the address from a valid state into an invalid state (block S111).
In other words, the current transaction master processor 7 intends to change the data after
caching it, so that the data enters a modified state. Then, the other processors make their cache
state for the data invalid and the bus snooper 10 performs the update algorithm, thus clearing the
valid bit of each processor whose address matches.

[75] Preferably, the bus snooper 10 can further perform for the transaction master
processor 7 the receiving cache block information block S106, the storing tags block S107
and the setting valid bits block S108. In other words, the bus snooper 10 updates the tag RAM
of the tag storage unit 20 and the valid array of the valid array unit 30 for the processor by using
the placement algorithm. Further, as a result of the determining (block S110), even in a case
where the matched processor does not exist, the bus snooper 10 performs for the
transaction master processor 7 the receiving cache block information block S106, the storing
tags block S107 and the setting valid bits block S108.

[76] On the other hand, in a case where the attribute is not 'read-with-intent-to-modify'
as a result of the determining (block S109), the bus snooper 10 can determine
whether the attribute is a 'kill' transaction (block S112). At this time, in a case where the
attribute is a 'kill' transaction as a result of the determining (block S112), the bus snooper 10
determines whether a processor whose address matches the 'read-with-intent-to-modify'
exists (block S110) and performs the clearing valid bits (block S111), the receiving cache
blocks (block S106), the storing tags (block S107) and the setting valid bits (block S108) as performed in
a case where the attribute is the 'read-with-intent-to-modify'. In other words, the 'kill'
transaction means that a current bus master processor 7, which can become a transaction
master processor, intends to change cache data in a shared state, and a state of cache memory
6 of the bus master processor 7 is shifted from the shared state into the modified state.
Accordingly, in a case where the attribute is a 'kill' transaction, the bus snooper can perform
the same operations as in a case where the attribute is 'read-with-intent-to-modify'.

[77] In contrast, in a case where the attribute is not a 'kill' transaction as a result of
the determining (block S112), the bus snooper 10 determines whether a snoop push is
performed by a cast out through a replacement operation of the cache controller (block
S113). In a case where a snoop push by a cast out is not performed as a result of the
determining (block S113), the bus snooper can end its operation. In contrast, in a case
where a snoop push is performed by a cast out as a result of the determining (block S113),
the bus snooper 10 preferably clears a valid bit of the matched processor having a tag for the
address from a valid state into an invalid state (block S114).

[78] In other words, the snoop push is preferably performed in three cases. The first 
is a case in which a snoop hit is caused by a 'read' of another bus master processor 7, and the 
state of the cache memory 6 is shifted to a shared state. The second is a case in which a snoop 
hit is caused by a 'write' or 'read-with-intent-to-modify' of another master processor 7, and 
the state of the cache memory 6 is shifted to an invalid state. The third is a case in which a 
cast out is performed by a replacement operation of the cache controller, and the state of the 
cache memory 6 is shifted to an invalid state. Accordingly, in all the cases in which a snoop 
push is performed, there is no need to perform the cache flush algorithm for the corresponding 
address when a system event occurs, so that the bus snooper 10 clears the corresponding valid bit. 
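
The update behavior described for the 'read-with-intent-to-modify', 'kill' and snoop push 
cases can be sketched in Python as follows. This is only an illustration under stated 
assumptions, not the disclosed implementation: the Directory class, the split and 
snoop_update functions, the attribute strings, and the 5-bit index and offset widths are 
hypothetical names and values chosen for the example, and handling of a plain 'read' 
transaction is left out. 

```python
from dataclasses import dataclass, field

INDEX_BITS = 5    # assumption: 32 entries per directory
OFFSET_BITS = 5   # assumption: 32-byte cache blocks


def split(addr):
    """Split an address into (tag, index); the field widths are assumptions."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index


@dataclass
class Directory:
    """Hypothetical stand-in for one processor's tag RAM and valid array."""
    tags: dict = field(default_factory=dict)   # index -> stored tag
    valid: set = field(default_factory=set)    # indexes whose valid bit is 1

    def matches(self, addr):
        tag, index = split(addr)
        return index in self.valid and self.tags.get(index) == tag

    def place(self, addr):
        # Blocks S106 to S108: store the tag and set the valid bit.
        tag, index = split(addr)
        self.tags[index] = tag
        self.valid.add(index)

    def clear(self, addr):
        # Blocks S111 and S114: valid state -> invalid state.
        _, index = split(addr)
        self.valid.discard(index)


def snoop_update(dirs, master, attribute, addr):
    """Update all directories for one bus transaction issued by `master`."""
    if attribute in ("read-with-intent-to-modify", "kill"):   # S109, S112
        for d in dirs:
            if d.matches(addr):                                # S110
                d.clear(addr)                                  # S111
        dirs[master].place(addr)                               # S106 to S108
    elif attribute == "snoop-push":                            # S113
        for d in dirs:
            if d.matches(addr):
                d.clear(addr)                                  # S114
```

For example, if processor 1 holds an address and processor 0 issues 
'read-with-intent-to-modify' for it, the sketch clears the entry of processor 1 and records 
the address for processor 0; a later snoop push of that address clears the entry of processor 0. 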

[79] The above-described embodiment of the update algorithm or process of the 
bus snooper 10 makes it possible to precisely manage the cache memory state of each of the 
processors, so that the cache flush unit 40 can properly perform the cache flush algorithm 
when a system event occurs. 

[80] The cache flush (block S20) will now be described. Figure 14 is a flow 
diagram illustrating an exemplary cache flush procedure in Figure 11. Figure 14 illustrates 
a cache flush of a cache block in a valid state of a processor 7 in response to an event 
occurrence. For example, by looking for a processor in a valid state among processor 0 to 
processor 3 illustrated in Figure 6b and a cache block in a valid state of the identified or 
selected processor, a corresponding address can be extracted and a transaction can be 
generated and output to perform a cache flush process. 

[81] As shown in Figure 14, after a process starts, an event detector 42 of the cache 
flush unit 40 notifies a cache flusher 41 in real time of whether a system event pre-designated 
to trigger the cache flush algorithm occurs. The cache flusher 41 receives the 
notification about whether such an event occurs from the event detector 42 (block S201). 

[82] The cache flusher 41 can check whether the valid array state of all processors is 
valid (e.g., whole valid array state = 1) by ORing all valid bits of the valid arrays for all the 
processors 7 (block S202). In a case where the valid array state of all processors is valid as a 
result of the checking, the cache flusher 41 performs the cache flush algorithm for the cache 
memory 6 of each processor 7 from processor 0 to processor 3. Because the valid array state 
of all processors is the OR of all valid bits of the valid arrays of all processors, it is valid 
if at least one valid bit is '1', that is, if at least one valid bit is in a valid state. 

[83] Particularly, the cache flusher 41 can determine whether the valid array state of a 
first processor N (N=0) is valid (e.g., valid array state (N)=1) (block S204). The valid array 
state (N) can be obtained by ORing the final outputs of the DCOR 32 collected for 
each cache block. 
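
The three levels of OR reduction described above (per cache block, per processor, and over 
all processors) can be illustrated with a short Python sketch. It is offered only as an 
aid to understanding: the function names and the representation of each cache block's valid 
array as an integer bit mask, held in valid[N][n], are assumptions made for the example. The 
counts of 4 processors and 8 cache blocks per processor follow the exemplary numbers used 
in this description. 

```python
NUM_PROCESSORS = 4    # processor 0 to processor 3
BLOCKS_PER_CPU = 8    # cache blocks per processor in the exemplary embodiment


def block_state(valid, N, n):
    """valid array state (N, n): 1 if any valid bit of cache block n is set."""
    return 1 if valid[N][n] != 0 else 0


def processor_state(valid, N):
    """valid array state (N): OR of the per-block states of processor N."""
    state = 0
    for n in range(BLOCKS_PER_CPU):
        state |= block_state(valid, N, n)
    return state


def whole_state(valid):
    """Whole valid array state: OR of the states of all processors."""
    state = 0
    for N in range(NUM_PROCESSORS):
        state |= processor_state(valid, N)
    return state
```

With every mask equal to zero, whole_state returns 0 and no flush is needed; setting a 
single bit in any valid[N][n] makes block_state, processor_state for that N, and whole_state 
all return 1. 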

[84] In a case where the valid array state of the processor is not valid as a result of the 
determining, that is, in a case where the valid array state of the processor is invalid (e.g., valid 
array state (N)=0), the cache flusher 41 performs the determination of the processor valid array 
state for a next processor N (N=N+1) (block S205). In contrast, in a case where the valid 
array state is valid (e.g., valid array state (N)=1) as a result of the determining (block S204), 
the cache flusher 41 can determine whether the valid array state for a first cache block n (n=0) 
of the processor is valid (e.g., valid array state (N, n)=1) (block S207). Whether the valid array 
state for a first cache block of the processor is valid can be determined by checking the final 
output of the DCOR 32 collected for each cache block. 

[85] In a case where the valid array state of the cache block is not valid as a result of 
the determining (block S207), that is, the valid array state is invalid (e.g., valid array state 
(N, n)=0), the cache flusher 41 performs the determining of the cache block valid array state 
(block S207) for a next cache block n (n=n+1) (block S208). In contrast, in a case where the 
valid array state of the cache block is valid (e.g., valid array state (N, n)=1) as a result of 
the determining (block S207), the cache flusher 41 uses the value extracted by the address 
arrangement algorithm as an index, provides the tag RAM unit 21 of the corresponding tag 
storage unit 20 with the index, receives tag information corresponding to the index from the 
match logic unit 22 of the tag storage unit 20, and extracts the corresponding cache block 
address designated by mapping the index and the tags (block S209). The cache block address is 
extracted by extracting the address corresponding to a valid bit in a valid state in the valid 
array of the corresponding cache block. 

[86] The cache flusher 41 can generate a 'read' transaction for the cache block 
corresponding to the extracted cache block address and output the 'read' transaction 
through the cache bus master 43 to the processor bus 12. Thus, a processor 7, corresponding to 
the cache memory to which the cache block belongs, performs a cache flush 
process for the cache block of the corresponding cache memory 6 (block S210). In particular, 
the 'read' transaction for the cache block is generated for the address extracted for a valid 
bit in a valid state in the valid array of the cache block. 
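
The mapping of an index and a tag to a cache block address, followed by generation of a 
'read' transaction, might look like the following Python sketch. The field widths, the 
dictionary used in place of the tag RAM, and the tuple used to represent a transaction are 
all assumptions made for illustration; the actual address layout depends on the cache 
organization. 

```python
OFFSET_BITS = 5   # assumption: 32-byte cache blocks
INDEX_BITS = 5    # assumption: five-bit index


def block_address(tag, index):
    """Rebuild a cache block address by concatenating tag, index and zero offset."""
    return (tag << (INDEX_BITS + OFFSET_BITS)) | (index << OFFSET_BITS)


def read_transaction(tag_ram, index):
    """Look up the tag for `index` and build a hypothetical 'read' transaction."""
    tag = tag_ram[index]          # tag information returned for the index
    return ("read", block_address(tag, index))
```

For instance, with tag 0x48D stored at index 7, the sketch yields a 'read' transaction for 
address 0x1234E0. 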

[87] The address arrangement algorithm can be implemented by using the DCOR 32. 
An exemplary address arrangement algorithm will now be described. Figure 15 illustrates an 
exemplary address arrangement algorithm using the DCOR illustrated in Figure 8. 

[88] It is assumed in this specification that a direction towards an index '0b00000' 
is a lower direction and a direction towards an index '0b11111' is a higher direction. At each 
determination, selecting the lower direction means '0' and selecting the higher direction 
means '1'. The address arrangement algorithm can trace toward the lower direction index as 
long as the output of each determination of the DCOR 32 is '1', which indicates that an 
occupied cache block exists in the corresponding branch. Accordingly, as illustrated in 
Figure 15, the lower direction is '1' at the first OR determination from the right end, so that 
it means '0' according to the lower direction selection. The lower direction is '1' at the 
second OR determination from the right end, so that it means '0' according to the lower 
direction selection. The higher direction is '1' at the third, fourth and fifth OR 
determinations from the right end, so that each means '1' according to the higher direction 
selection. Finally, the result is '00111', so that an index '0b00111' is selected. 
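
The tracing described above can be sketched in Python as a descent through a binary OR tree 
that prefers the lower branch whenever its OR output is '1'. The function name dcor_index 
and the representation of the 32 valid bits as an integer (bit i standing for index i) are 
assumptions for this example; the hardware DCOR 32 evaluates the OR determinations in 
parallel rather than in a loop. 

```python
INDEX_BITS = 5   # five-bit index, 0b00000 to 0b11111


def dcor_index(valid_bits):
    """Return the index selected by tracing the OR tree, or None if no bit is set."""
    lo, hi = 0, 1 << INDEX_BITS          # current branch covers indexes lo..hi-1
    if valid_bits & ((1 << hi) - 1) == 0:
        return None                      # root OR output is 0: nothing to flush
    index = 0
    for _ in range(INDEX_BITS):
        mid = (lo + hi) // 2
        # OR of the valid bits in the lower half of the current branch
        lower_or = (valid_bits >> lo) & ((1 << (mid - lo)) - 1)
        if lower_or:
            index = index << 1           # lower direction selected: bit '0'
            hi = mid
        else:
            index = (index << 1) | 1     # higher direction selected: bit '1'
            lo = mid
    return index
```

If the lowest occupied entry is index 7, the two determinations nearest the root select the 
lower direction and the remaining three select the higher direction, so 
format(dcor_index(1 << 7), '05b') gives '00111', matching the example of Figure 15. 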

[89] Then, the cache flusher 41 can determine whether the flushed cache 
block is a final cache block among the cache blocks that belong to the cache memory 6 
corresponding to the processor 7 that is an object of the 'read' transaction (block S211). 
A final cache block in this case means an 8th cache block ((M,N)=(M,7), where 
0≤M≤3) when the number of processors 7 that are objects of the 'read' transaction is 4 and 
the number of cache blocks that belong to the cache memory 6 corresponding to each 
processor is 8. However, the present invention is not intended to be so limited. 

[90] In a case where the cache block is not a final cache block as a result of the 
determining (block S211), the cache flusher 41 performs the determining of the valid array 
state (block S207) for a next cache block (block S208). In contrast, in a case where the cache 
block is a final cache block as a result of the determining (block S211), the cache flusher 41 
can determine whether the processor that is an object of the 'read' transaction is a final 
processor among all processors (block S212). The final processor can mean a fourth 
processor (e.g., (M,N)=(3,N), where N is 7). In a case where the processor is not 
a final processor as a result of the determining (block S212), the cache flusher 41 can 
perform the determining of the valid array state of a next processor (block S205). In contrast, 
in a case where the processor is the final processor as a result of the determining (block 
S212), the cache flusher 41 can end the cache flush process. 

[91] As described above, the cache flusher 41 can repeat the process or the same 
algorithm from the beginning once the cache flush process is completed for all cache blocks 
of each processor 7. The repetition preferably continues until each processor 7 disables its 
corresponding cache memory 6. Each of the processors 7 can disable its corresponding 
cache memory 6 on the basis of the valid bits for that processor 7. 
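
One pass of the cache flush procedure of Figure 14 can be summarized by the following 
Python sketch. It is an illustration under assumptions rather than the disclosed circuit: 
valid[N][n] is assumed to be an integer bit mask for cache block n of processor N, 
issue_read is a hypothetical callback standing in for the cache bus master 43, a 'read' is 
assumed to be issued for every valid bit of a block, and a simple lowest-set-bit expression 
stands in for the address arrangement algorithm. The sketch does not clear valid bits 
itself, since in the described system they are cleared by the bus snooper 10 when the 
resulting snoop push is observed. 

```python
NUM_PROCESSORS = 4    # processor 0 to processor 3
BLOCKS_PER_CPU = 8    # cache blocks per processor in the exemplary embodiment


def flush_pass(valid, issue_read):
    """Walk all processors and cache blocks once; return the number of reads issued."""
    if not any(bits for cpu in valid for bits in cpu):      # S202: whole state is 0
        return 0
    issued = 0
    for N in range(NUM_PROCESSORS):                         # S204, S205, S212
        if not any(valid[N]):
            continue                                        # processor state is 0
        for n in range(BLOCKS_PER_CPU):                     # S207, S208, S211
            bits = valid[N][n]
            while bits:
                index = (bits & -bits).bit_length() - 1     # lowest valid index
                issue_read(N, n, index)                     # S209, S210
                bits &= bits - 1                            # move to next valid bit
                issued += 1
    return issued
```

A caller could invoke flush_pass repeatedly after a system event, stopping once the 
processors have disabled their cache memories, which mirrors the repetition described in 
paragraph [91]. 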

[92] As described above, embodiments of a cache flush system and methods 
thereof according to the present invention have various advantages. Embodiments of the 
present invention can reduce or minimize the load on a processor bus, for example, by 
performing memory reads amounting to at most the size of the level 2 cache memory of each 
processor. Further, embodiments can increase or assure the simultaneity of a cache flush 
for a certain event by performing a cache flush directly triggered by the certain event. Thus, 
a high speed cache flush process can be performed. Further, a cache flush process can be 
automatically generated. 

[93] Any reference in this specification to "one embodiment," "an embodiment," 
"example embodiment," etc., means that a particular feature, structure, or characteristic 
described in connection with the embodiment is included in at least one embodiment of the 
invention. The appearances of such phrases in various places in the specification are not 
necessarily all referring to the same embodiment. Further, when a particular feature, 
structure, or characteristic is described in connection with any embodiment, it is submitted 
that it is within the purview of one skilled in the art to effect such feature, structure, or 
characteristic in connection with other ones of the embodiments. Furthermore, for ease of 
understanding, certain method procedures may have been delineated as separate procedures; 
however, these separately delineated procedures should not be construed as necessarily order 
dependent in their performance. That is, some procedures may be able to be performed in 
an alternative ordering, simultaneously, etc. 

[94] The foregoing embodiments and advantages are merely exemplary and are not 
to be construed as limiting the present invention. The present teaching can be readily 
applied to other types of apparatuses. The description of the present invention is intended 
to be illustrative, and not to limit the scope of the claims. Many alternatives, modifications, 
and variations will be apparent to those skilled in the art. In the claims, means-plus-function 
clauses are intended to cover the structures described herein as performing the recited 
function and not only structural equivalents but also equivalent structures. 